Improved tone modeling for Mandarin broadcast news speech recognition
Authors
Abstract
Tone plays a crucial role in Mandarin speech by distinguishing otherwise ambiguous words. Most state-of-the-art Mandarin automatic speech recognition systems adopt embedded tone modeling, where tonal acoustic units are used and F0 features are appended to the spectral feature vector. In this paper, we combine the embedded approach (using improved F0 smoothing) with explicit tone modeling to rescore the output lattices. Oracle experiments indicate that a 32% relative improvement can be achieved by rescoring with perfect tone information. Recognition experiments on Mandarin broadcast news show that, even with an accuracy of only 70%, the explicit tone classifier offers complementary knowledge and improves performance significantly. Through the combination of tone modeling techniques, the character error rate on the CTV test set is reduced from 13.0% to 11.5%.
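The rescoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on an N-best list rather than a lattice, assumes every hypothesis has the same number of syllables aligned one-to-one with the tone classifier's outputs, and combines scores log-linearly. The names `rescore_nbest`, `tone_weight`, and `relative_reduction` are hypothetical and chosen for illustration only.

```python
import math


def rescore_nbest(nbest, tone_posteriors, tone_weight=1.0):
    """Pick the best hypothesis after adding an explicit tone score.

    nbest: list of (text, tones, baseline_log_score) tuples, where tones is
        a list of tone labels, one per syllable (assumed alignment).
    tone_posteriors: list of dicts, one per syllable, mapping a tone label
        to the probability given by a tone classifier (assumed input).
    tone_weight: hypothetical scaling factor for the tone log-score.
    """
    best_text, best_score = None, -math.inf
    for text, tones, base_score in nbest:
        # Sum of log posteriors for the tones this hypothesis implies.
        # Probabilities are floored to avoid log(0).
        tone_score = sum(
            math.log(max(post.get(tone, 0.0), 1e-10))
            for tone, post in zip(tones, tone_posteriors)
        )
        total = base_score + tone_weight * tone_score
        if total > best_score:
            best_text, best_score = text, total
    return best_text, best_score


def relative_reduction(before, after):
    """Relative error-rate reduction, e.g. for character error rate."""
    return (before - after) / before
```

With the figures quoted in the abstract, `relative_reduction(13.0, 11.5)` is roughly 0.115, that is, about an 11.5% relative reduction in character error rate. Setting `tone_weight` to 0 recovers the baseline ranking, which makes the contribution of the tone score easy to isolate in such a sketch.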
Similar resources
Unsupervised and Semi-supervised Learning of Tone and Pitch Accent
Recognition of tone and intonation is essential for speech recognition and language understanding. However, most approaches to this recognition task have relied upon extensive collections of manually tagged data obtained at substantial time and financial cost. In this paper, we explore two approaches to tone learning with substantial reductions in training data. We employ both unsupervised cl...
Unsupervised Learning of Tone and Pitch Accent
Recognition of tone and intonation is essential for speech recognition and language understanding. However, most approaches to this recognition task have relied upon extensive collections of manually tagged data obtained at substantial time and financial cost. In this paper, we explore unsupervised clustering approaches to recognize pitch accent in English and tones in Mandarin Chinese. In unsu...
Improved Tonal Language Speech Recognition by Integrating Spectro-Temporal Evidence and Pitch Information with Properly Chosen Tonal Acoustic Units
We propose an improved Tandem system for tonal language speech recognition. Three different types of features, cepstral, spectro-temporal and pitch features, are integrated for modeling tone and phoneme variation simultaneously. Tonal phonemes (or tonemes) are used for MLP posterior estimation, and tonal acoustic units for HMM recognition. In our experiments conducted on Mandarin broadcast news...
Voice retrieval of Mandarin broadcast news speech
This paper presents an improved framework for voice retrieval of Mandarin broadcast news speech. First, several unsupervised and data-driven approaches for broadcast news transcription were proposed to improve the speech recognition accuracy and efficiency. Then, a multiscale indexing paradigm for broadcast news retrieval was exploited to alleviate the problems caused by the speech recognition ...
Broadcast news transcription in Mandarin
In this paper, our work in developing a Mandarin broadcast news transcription system is described. The main focus of this work is a port of the LIMSI American English broadcast news transcription system to the Chinese Mandarin language. The system consists of an audio partitioner and an HMM-based continuous speech recognizer. The acoustic models were trained on about 24 hours of data from the 1...